TELLTALE: Experiments in a Dynamic Hypertext Environment for Degraded and Multilingual Data
Abstract
Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora; their algorithms depend on the language for which they were written (e.g., English); and they perform poorly on misspelled words or on text degraded by optical character recognition (OCR). In this article, we present experimental results for the TELLTALE system. TELLTALE is a dynamic hypertext environment that provides full-text search from a hypertext-style user interface for text corpora that may be garbled by OCR or transmission errors and that may contain languages other than English. TELLTALE uses several techniques based on n-grams (sequences of n consecutive characters of text). These results show that the dynamic linkage mechanisms in TELLTALE tolerate garbles in up to 30% of the characters in the body of the text.
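To make the n-gram idea concrete, here is a minimal Python sketch of character n-gram matching on garbled text. It is an illustration only, not TELLTALE's implementation: the abstract does not state the value of n, the weighting scheme, or the similarity measure, so the trigram size, the raw counts, the cosine similarity, and the helper names `ngram_profile`, `cosine_similarity`, and `garble` are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def ngram_profile(text, n=3):
    """Count the overlapping character n-grams in text.

    Lowercasing and whitespace collapsing are illustrative choices,
    not details taken from the abstract.
    """
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def garble(text, rate, seed=0):
    """Replace roughly `rate` of the non-space characters with random letters.

    A crude stand-in for OCR or transmission errors (hypothetical noise model).
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(alphabet) if ch != " " and rng.random() < rate else ch
        for ch in text
    )


original = (
    "TELLTALE is a dynamic hypertext environment that provides full-text "
    "search for text corpora that may be garbled by OCR or transmission errors"
)
garbled = garble(original, 0.30)

# Compare the clean text with its garbled copy using trigram profiles.
score = cosine_similarity(ngram_profile(original), ngram_profile(garbled))
print(garbled)
print(f"similarity to original: {score:.3f}")
```

Because a single wrong character damages only the n-grams that overlap it, a garbled copy still shares many n-grams with its source, whereas whole-word matching loses every word that contains an error. The same profile-building code applies unchanged to text in other languages, since it makes no use of word lists or stemming rules.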
Similar resources
The TELLTALE Dynamic Hypertext Environment: Approaches to Scalability
Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms are dependent on the language for which they are written, e.g. English, and they don't perform well when presented with missp...
Performance and Scalability of a Large-Scale N-gram Based Information Retrieval System
Information retrieval has become more and more important due to the rapid growth of all kinds of information. However, there are few suitable systems available. This paper presents a few approaches that enable large-scale information retrieval for the TELLTALE system. TELLTALE is a dynamic hypertext information retrieval environment. It provides full-text search for text corpora that may be gar...
Journal: JASIS
Volume: 47
Publication date: 1996